Bridge the Gap Between VQA and Human Behavior on Omnidirectional Video: A Large-Scale Dataset and a Deep Learning Model

Authors: Du Xinzhe, Li Chen, Wang Zulin, Xu Mai
Publication venue: Association for Computing Machinery (ACM)
Publication date: 28/07/2018

Omnidirectional video enables spherical stimuli with the $360 \times 180^\circ$ viewing range. Meanwhile, only the viewport region of omnidirectional video can be seen by the observer through head movement (HM), and an even smaller region within the viewport can be clearly perceived through eye movement (EM). Thus, the subjective quality of omnidirectional video may be correlated with HM and EM of human behavior. To fill in the gap between subjective quality and human behavior, this paper proposes a large-scale visual quality assessment (VQA) dataset of omnidirectional video, called VQA-OV, which collects 60 reference sequences and 540 impaired sequences. Our VQA-OV dataset provides not only the subjective quality scores of sequences but also the HM and EM data of subjects. By mining our dataset, we find that the subjective quality of omnidirectional video is indeed related to HM and EM. Hence, we develop a deep learning model, which embeds HM and EM, for objective VQA on omnidirectional video. Experimental results show that our model significantly improves the state-of-the-art performance of VQA on omnidirectional video.

Comment: Accepted by ACM MM 201

arXiv.org e-Print Archive

Crossref

Experimental and CFD Study of Flow Phenomenon in Flowrate-amplified Flotation Element

Authors: Xin Li, Xinzhe Wang
Publication venue: Technische Universität Dresden
Publication date: 01/01/2016

Focusing on reducing the air consumption of an air flotation rail system, a flowrate-amplified flotation element was recently developed. This new flotation element utilises rotational flow to draw in extra air via an intake hole, and thus effectively improves the flotation height. Compared to a conventional flotation element, the flowrate-amplified flotation element can reduce air consumption by approximately 50% for the same load and flotation height. To gain an understanding of the flow phenomenon in the flowrate-amplified flotation element, experiments and CFD simulations are conducted in this study. Based on the results, we find that the flowrate-amplified flotation element uses part of the kinetic energy of the rotating air to suck in extra air. The intake hole greatly affects the pressure field and velocity field of the flotation element. Additionally, the effects of varying the gap height and the supplied flow rate are discussed. The results indicate that the pressure distribution decreases as the gap height increases and increases as the supplied flow rate increases.

Technische Universität Dresden: Qucosa

Toward Linearizability Testing for Multi-Word Persistent Synchronization Primitives

Authors: Cepeda Diego, Chowdhury Sakib, Golab Wojciech, Li Nan, Lopez Raphael, Wang Xinzhe
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 23rd International Conference on Principles of Distributed Systems (OPODIS 2019)
Publication date: 01/01/2020

Persistent memory makes it possible to recover in-memory data structures following a failure instead of rebuilding them from state saved in slow secondary storage. Implementing such recoverable data structures correctly is challenging as their underlying algorithms must deal with both parallelism and failures, which makes them especially susceptible to programming errors. Traditional proofs of correctness should therefore be combined with other methods, such as model checking or software testing, to minimize the likelihood of uncaught defects. This research focuses specifically on the algorithmic principles of software testing, particularly linearizability analysis, for multi-word persistent synchronization primitives such as conditional swap operations. We describe an efficient decision procedure for linearizability in this context, and discuss its practical applications in detecting previously unknown bugs in implementations of multi-word persistent primitives.
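
To make the notion of linearizability analysis concrete, here is a minimal brute-force sketch for a single-word compare-and-swap register. It is not the efficient decision procedure the abstract refers to, and it ignores multi-word operations, persistence, and crash recovery entirely. The names `Op`, `respects_real_time`, `is_legal_sequential`, and `is_linearizable` are hypothetical, and the checker simply tries every ordering of the recorded operations, so it is only practical for very short histories.

```python
from itertools import permutations
from typing import NamedTuple


class Op(NamedTuple):
    """One recorded compare-and-swap call (hypothetical record layout)."""
    start: float    # invocation time
    end: float      # response time
    expected: int   # value the CAS expected to find
    new: int        # value the CAS tried to write
    success: bool   # result the caller actually observed


def respects_real_time(order):
    """An operation that finished before another began must come first."""
    for i, earlier in enumerate(order):
        for later in order[i + 1:]:
            if later.end < earlier.start:
                return False
    return True


def is_legal_sequential(order, initial):
    """Replay the operations one by one against a sequential CAS register."""
    value = initial
    for op in order:
        would_succeed = (value == op.expected)
        if would_succeed != op.success:
            return False
        if would_succeed:
            value = op.new
    return True


def is_linearizable(history, initial=0):
    """Brute force: accept if any real-time-respecting order replays legally."""
    return any(
        respects_real_time(order) and is_legal_sequential(order, initial)
        for order in permutations(history)
    )


# Two overlapping CAS calls, one succeeding and one failing: linearizable.
overlapping = [Op(0, 3, 0, 1, True), Op(1, 2, 0, 2, False)]
# Two overlapping CAS calls from the same expected value that both report
# success: no sequential order can explain this.
both_succeed = [Op(0, 3, 0, 1, True), Op(1, 2, 0, 2, True)]

print(is_linearizable(overlapping))
print(is_linearizable(both_succeed))
```

Trying all permutations costs factorial time in the number of operations, which is why a dedicated decision procedure matters for realistic test histories.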

Dagstuhl Research Online Publication Server

BInGo: Bayesian Intrinsic Groupwise Registration via Explicit Hierarchical Disentanglement

Authors: Luo Xinzhe, Wang Xin, Zhuang Xiahai
Publication date: 11/12/2022

Multimodal groupwise registration aligns internal structures in a group of medical images. Current approaches to this problem involve developing similarity measures over the joint intensity profile of all images, which may be computationally prohibitive for large image groups and unstable under various conditions. To tackle these issues, we propose BInGo, a general unsupervised hierarchical Bayesian framework based on deep learning, to learn intrinsic structural representations to measure the similarity of multimodal images. In particular, a variational auto-encoder with a novel posterior is proposed, which facilitates the disentanglement learning of structural representations and spatial transformations, and characterizes the imaging process from the common structure with shape transition and appearance variation. Notably, BInGo can learn from small groups while being tested on large-scale groupwise registration, thus significantly reducing computational costs. We compared BInGo with five iterative or deep learning methods on three public intrasubject and intersubject datasets, i.e., BraTS, MS-CMR of the heart, and Learn2Reg abdomen MR-CT, and demonstrated its superior accuracy and computational efficiency, even for very large group sizes (e.g., over 1300 2D images from MS-CMR in each group).

arXiv.org e-Print Archive

DPATD: Dual-Phase Audio Transformer for Denoising

Authors: Li Jialu, Li Junhui, Wang Pu, Wang Xinzhe, Zhang Youshan
Publication date: 30/10/2023

Recent high-performance transformer-based speech enhancement models demonstrate that time-domain methods can achieve performance similar to that of time-frequency domain methods. However, time-domain speech enhancement systems typically receive input audio sequences consisting of a large number of time steps, making it challenging to model extremely long sequences and train models to perform adequately. In this paper, we use smaller audio chunks as input to make efficient use of audio information and address the above challenges. We propose a dual-phase audio transformer for denoising (DPATD), a novel model that organizes transformer layers in a deep structure to learn clean audio sequences for denoising. DPATD splits the audio input into smaller chunks, where the input length can be proportional to the square root of the original sequence length. Our memory-compressed explainable attention is efficient and converges faster than the frequently used self-attention module. Extensive experiments demonstrate that our model outperforms state-of-the-art methods.

Comment: IEEE DD
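
The chunking idea in the abstract, with an input length proportional to the square root of the original sequence length, can be illustrated with a small sketch. This is not the DPATD implementation: the abstract does not say whether chunks overlap or how the tail is padded, so the non-overlapping split, the zero padding, and the helper name `split_into_chunks` are all assumptions made for illustration.

```python
import math


def split_into_chunks(samples, chunk_len=None):
    """Split a 1-D sequence into equal-length, non-overlapping chunks.

    If chunk_len is omitted it defaults to ceil(sqrt(len(samples))), so both
    the chunk length and the number of chunks grow roughly like sqrt(L).
    The last chunk is zero-padded (an assumption, not from the paper).
    """
    n = len(samples)
    if n == 0:
        return []
    if chunk_len is None:
        chunk_len = math.isqrt(n - 1) + 1  # integer ceil(sqrt(n)) for n >= 1
    pad = (-n) % chunk_len                 # zeros needed to fill the last chunk
    padded = list(samples) + [0.0] * pad
    return [padded[i:i + chunk_len] for i in range(0, len(padded), chunk_len)]


# One second of audio at 16 kHz becomes 126 chunks of 127 samples each.
chunks = split_into_chunks([0.0] * 16000)
print(len(chunks), len(chunks[0]))
```

With this split, a model can attend within each chunk and then across chunks, so neither pass has to handle a sequence much longer than about the square root of the original length.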

arXiv.org e-Print Archive